Clustering ensemble algorithms based on improved genetic algorithm in cloud computing
XU Zhanyang, ZHENG Kezhang
Journal of Computer Applications, 2018, 38(2): 458-463. DOI: 10.11772/j.issn.1001-9081.2017071749
Unsupervised clustering lacks prior information about data classification, the accuracy of base clusterings depends on the clustering algorithm used, and general clustering ensemble algorithms have high space complexity. To address these problems, a Clustering Ensemble algorithm based on Improved Genetic Algorithm (CEIGA) was proposed. Because traditional clustering ensemble algorithms cannot meet the time requirements of large-scale data processing, a Parallel Clustering Ensemble algorithm based on Improved Genetic Algorithm (PCEIGA), built on Hadoop for cloud computing, was also proposed. Firstly, the base clustering partitions produced by the base clustering generation mechanism were relabeled and encoded as the initial population of the improved Genetic Algorithm (GA). Secondly, the selection operator of the GA was improved to ensure the diversity of the base clusterings. Using the improved selection operator, crossover and mutation operations were applied to the chromosomes, and the next-generation population was obtained with an elitist strategy to preserve the accuracy of the base clusterings. In this way, the final clustering ensemble result reached the global optimum and the accuracy of the algorithm was improved. To improve the efficiency of the proposed algorithms, two MapReduce processes were designed, and one Combine process was added to reduce communication among nodes. Finally, CEIGA, PCEIGA and four advanced clustering ensemble algorithms were compared on UCI data sets. The experimental results show that CEIGA outperforms the other advanced clustering ensemble algorithms, and that PCEIGA significantly reduces running time and improves efficiency without decreasing the accuracy of the clustering results.
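The pipeline described in the abstract (relabel base partitions, encode them as chromosomes, then apply selection, crossover, mutation and elitism) can be illustrated with a minimal Python sketch. This is not the authors' CEIGA. The abstract does not specify the improved selection operator, the fitness function or the relabeling scheme, so everything below is an assumption made for illustration: relabeling by order of first appearance, mean Rand index against the base partitions as fitness, binary tournament selection as a stand-in for the improved operator, and one-point crossover. All function and parameter names are hypothetical.

```python
import random
from itertools import combinations


def relabel(partition):
    """Rename labels in order of first appearance, so that
    [5, 5, 2] and [0, 0, 1] share one encoding (assumed scheme)."""
    mapping = {}
    return [mapping.setdefault(label, len(mapping)) for label in partition]


def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def fitness(chromosome, base):
    """Assumed fitness: mean Rand index against all base partitions."""
    return sum(rand_index(chromosome, p) for p in base) / len(base)


def evolve(base, generations=30, mutation_rate=0.05, elite=1, seed=0):
    """Evolve a consensus partition from a list of base partitions."""
    rng = random.Random(seed)
    # Initial population: the relabeled base partitions themselves.
    population = [relabel(p) for p in base]
    n_labels = max(max(p) for p in population) + 1

    def pick(ranked):
        # Binary tournament: a stand-in for the paper's improved selection.
        return max(rng.sample(ranked, 2), key=lambda c: fitness(c, base))

    for _ in range(generations):
        ranked = sorted(population, key=lambda c: fitness(c, base), reverse=True)
        next_gen = ranked[:elite]  # elitist strategy: keep the best unchanged
        while len(next_gen) < len(population):
            p1, p2 = pick(ranked), pick(ranked)
            cut = rng.randrange(1, len(p1))  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: reassign a point to a random cluster label.
            child = [rng.randrange(n_labels) if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(relabel(child))
        population = next_gen
    return max(population, key=lambda c: fitness(c, base))


base_partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [2, 2, 2, 5, 5, 4],
    [0, 0, 1, 1, 1, 1],
]
consensus = evolve(base_partitions)
print(consensus, round(fitness(consensus, base_partitions), 3))
```

Because the best chromosome is copied unchanged into each new generation, the best fitness in the population never decreases under this sketch. First-appearance relabeling only partly aligns labels across partitions, so a real implementation would likely need a stronger label-matching step before crossover.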